Text File | 1991-08-20 | 7KB | 143 lines
Extracting Personal Names
This menu selection is new to this version of PC-INDEX. Extract
Personal Names goes through a document, finds personal names
(first and last names), and writes them out to a phrase file.
This file can then be used to create a name index, or merged
with another phrase file to create a more comprehensive index
that includes names.
This selection is not guaranteed to find all names in a document,
but it is a good starting point. It is more likely to extract
capitalized words that are not really names than to omit real
names.
To use this option correctly, it helps to understand what is
happening. PC-INDEX scans a document until it finds at least two
capitalized words in a row. It then looks up the first word in
the Personal Name File. If that word is found, the sequence of
capitalized words is assumed to be a person's name.
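The scanning rule described above can be sketched in Python. This
is only an illustration and not PC-INDEX's actual code: the
function name extract_names, the small FIRST_NAMES set standing in
for the Personal Name File, and the splitting of text on
whitespace are all assumptions made for the example.

```python
import string

# Hypothetical stand-in for the Personal Name File
# (the real file is said to hold over 12,000 first names).
FIRST_NAMES = {"brian", "mary", "john"}


def is_capitalized(word):
    """True if the word starts with an uppercase letter."""
    return bool(word) and word[0].isupper()


def extract_names(text, first_names=FIRST_NAMES, max_words=3):
    """Return runs of 2..max_words capitalized words whose first
    word appears in the first-name list."""
    # Assumption: words are separated by whitespace, and punctuation
    # around a word is ignored.
    words = [w.strip(string.punctuation) for w in text.split()]
    names = []
    i = 0
    while i < len(words):
        if not is_capitalized(words[i]):
            i += 1
            continue
        # Collect consecutive capitalized words, up to max_words.
        j = i
        while j < len(words) and j - i < max_words and is_capitalized(words[j]):
            j += 1
        run = words[i:j]
        if len(run) >= 2 and run[0].lower() in first_names:
            names.append(" ".join(run))
            i = j  # continue scanning after the extracted name
        else:
            i += 1  # first word is not a known name; move on one word
    return names


print(extract_names("Yesterday Brian BENSON met Mary Ann WILLIAMS in New York."))
```

Note that "Yesterday" and "New York" are skipped because their
first word is not in the name list, which mirrors the lookup step
described above.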
The Personal Name File contains over 12,000 first names. You may
want to browse through the list using Edit Personal Name File
(found in the Edit List Menu) to make sure that it contains the
names you know you need.
When you select Extract Personal Names, you will see a screen
asking you for an Input File Name, an Output File Name, the
Maximum Number of Words in a Name, and information regarding the
surname (last name).
For the input file name, enter the name of the document you want
to extract names from. For the output file name, enter any name
you want. A file name with the extension '.dbf' is recommended.
The maximum number of words in a name can be any number from 2
(a first and last name) to 6. In any case, the total number of
characters in a name must be 70 or less. For this example, enter
3 for the Maximum Number of Words in a Name.
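The limits above can be expressed as a small check. This is a
sketch under stated assumptions, not the program's own logic: the
function name name_within_limits is hypothetical, and it assumes
that the spaces between words count toward the 70 characters,
which the text does not specify.

```python
MIN_WORDS = 2        # a first and a last name
MAX_WORDS = 6        # upper bound for the setting
MAX_NAME_CHARS = 70  # total characters allowed in a name


def name_within_limits(name, max_words):
    """Check a candidate name against the word and character limits."""
    if not MIN_WORDS <= max_words <= MAX_WORDS:
        raise ValueError("max_words must be between 2 and 6")
    word_count = len(name.split())
    # Assumption: spaces are counted as part of the 70 characters.
    return MIN_WORDS <= word_count <= max_words and len(name) <= MAX_NAME_CHARS


print(name_within_limits("Brian BENSON", 3))
print(name_within_limits("Mary Ann Louise WILLIAMS", 3))
```

With the setting at 3, as in this example, a four-word name is
rejected even though it is well under 70 characters.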
The last three choices tell PC-INDEX how last names can be
recognized. These choices were added to help PC-INDEX find
names faster and more accurately.
The fastest and most accurate method for extracting names is Last
Name contains ALL CAPS. To use this option, every surname must be
written entirely in capital letters, and no word that is not a
surname may be written in all caps. If it isn't possible to use
all caps in last names, then use one of the other options. If it
doesn't matter to you whether last names are all caps or not,
then all caps is recommended. The increase in speed and accuracy
will be significant.
The next option, Last Name is not ALL CAPS, tells PC-INDEX that
no names will consist only of capital letters. This is the second
fastest and second most accurate method for extracting names.
The last option, Last Name may or may not be ALL CAPS, should be
selected if the way capital letters are used in names is not
consistent.
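One way to picture the three surname choices is as a filter on a
run of capitalized words. The sketch below is an interpretation of
the descriptions above, not the program's actual logic. The names
SurnameMode, is_all_caps and surname_fits are hypothetical, and
treating single-letter initials as "not all caps" is an assumption.

```python
from enum import Enum


class SurnameMode(Enum):
    ALL_CAPS = 1      # last name is written in all capital letters
    NOT_ALL_CAPS = 2  # no word of a name is written in all capitals
    EITHER = 3        # capitalization of last names is inconsistent


def is_all_caps(word):
    """True if the word has more than one letter and all are uppercase."""
    letters = [c for c in word if c.isalpha()]
    # Assumption: a single-letter initial such as "J." is not "all caps".
    return len(letters) > 1 and all(c.isupper() for c in letters)


def surname_fits(words, mode):
    """Decide whether a run of capitalized words suits the chosen mode.
    The last word of the run is taken to be the surname."""
    last, others = words[-1], words[:-1]
    if mode is SurnameMode.ALL_CAPS:
        return is_all_caps(last) and not any(is_all_caps(w) for w in others)
    if mode is SurnameMode.NOT_ALL_CAPS:
        return not any(is_all_caps(w) for w in words)
    return True  # EITHER: capitalization gives no information


print(surname_fits(["Brian", "BENSON"], SurnameMode.ALL_CAPS))
print(surname_fits(["Brian", "Benson"], SurnameMode.ALL_CAPS))
```

The first two modes can reject a candidate on capitalization
alone, which is consistent with them being described as faster and
more accurate than the third.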
For this example select Last Name contains ALL CAPS.
The completed screen should look something like this:
┌───────────────────────────────────────────────────────┐
│ Input File Name: (Name of Document to process)        │
│ pci.doc                                               │
│                                                       │
│ Output File Name:                                     │
│ pcinames.dbf                                          │
│                                                       │
│ Maximum Number of Words in a Name (2 ─ 6)             │
│ 3                                                     │
│                                                       │
│ X Last Name is ALL CAPS                               │
│                                                       │
│   Last Name is not ALL CAPS                           │
│                                                       │
│   Last Name may or may not be ALL CAPS                │
└───────────────────────────────────────────────────────┘
When you have finished entering the filenames and other
information, press F10 to begin processing.
You should see a status box which tells you the number of words
to be processed, the number of words actually processed, the
number of names found, percentage completed, and the elapsed
time.
After this is complete, browse through the names that were just
extracted by selecting Edit Extracted Name File from the Edit
List Menu. This will allow you to correct names if necessary, to
delete entries completely, or to manually add names to the list.
If you are following the entries in this example, the Extracted
Name File should look like this:
┌───────────────────────────────────────────────────────────────┐
│  ┌─────────────────── Edit Phrase List ───────────────────┐   │
│  │                                                        │   │
│  │ BENSON                                                 │   │
│  │ BENSON                                                 │   │
│  │ BENSON                                                 │   │
│  │ BENSON                                                 │   │
│  │ WILLIAMS                                               │   │
│  └────────────────────────────────────────────────────────┘   │
│                                                               │
│  ┌─────────────── Display Complete Phrase ────────────────┐   │
│  │ BENSON                                                 │   │
│  │ Brian                                                  │   │
│  │ Brian BENSON                                           │   │
│  └────────────────────────────────────────────────────────┘   │
└───────────────────────────────────────────────────────────────┘
You may want to merge the extracted name file with a phrase file
so an index will contain both names and phrases. Since the
extracted name file is actually a phrase file, you can use Merge
Phrase Files (found in the Merge Files Menu) to accomplish this.
You may notice that one entry lists the name Brian Brian BENSON.
This is not really a mistake. If you look at page 13 (as well as
the example above) you will see that the name Brian appears twice
before BENSON. PC-INDEX makes no attempt to find possible
mistakes; it only finds sequences of names. This is one example
of why you need to edit the extracted name list before you create
an index.
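Entries such as Brian Brian BENSON are easy to miss in a long
list. As an aid to manual editing, a check like the following
could flag names that contain the same word twice in a row. This
is a sketch outside of PC-INDEX itself; the function names are
hypothetical, and it assumes the extracted names are available as
plain strings.

```python
def has_repeated_word(name):
    """True if the same word appears twice in a row (case-insensitive)."""
    words = name.lower().split()
    return any(a == b for a, b in zip(words, words[1:]))


def flag_suspect_names(names):
    """Return the names worth a second look before building an index."""
    return [n for n in names if has_repeated_word(n)]


extracted = ["Brian BENSON", "Brian Brian BENSON", "Mary WILLIAMS"]
print(flag_suspect_names(extracted))
```

A flagged entry is only a candidate for correction; whether it is
really an error still has to be decided by reading the document.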
If you want to merge a name file with a phrase file, use
pcinames.dbf as the Input Merge File Name and phrase.dbf as the
Output Merge File Name. After performing this step, all
extracted names will be in the standard phrase file.
If you only have a few names in your document, you may want to
consider adding them manually to your phrase file.